Improving Minority Class Prediction Using Case-Specific Feature Weights
Authors
Abstract
This paper addresses the problem of handling skewed class distributions within the case-based learning (CBL) framework. We first present as a baseline an information-gain-weighted CBL algorithm and apply it to three data sets from natural language processing (NLP) with skewed class distributions. Although overall performance of the baseline CBL algorithm is good, we show that the algorithm exhibits poor performance on minority class instances. We then present two CBL algorithms designed to improve the performance of minority class predictions. Each variation creates test-case-specific feature weights by first observing the path taken by the test case in a decision tree created for the learning task, and then using path-specific information gain values to create an appropriate weight vector for use during case retrieval. When applied to the NLP data sets, the algorithms are shown to significantly increase the accuracy of minority class predictions while maintaining or improving overall classification accuracy.
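The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the decision tree is built or how the path-specific gains are combined, so the code below assumes symbolic features, an unpruned ID3-style tree, a weighted overlap metric for retrieval, and a weight of zero for every feature that is not tested on the test case's path. All function names (`entropy`, `info_gain`, `path_gains`, `case_specific_weights`, `retrieve`) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(cases, labels, f):
    """Information gain of the symbolic feature at index f."""
    n = len(labels)
    groups = {}
    for case, label in zip(cases, labels):
        groups.setdefault(case[f], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def path_gains(cases, labels, query, used=frozenset()):
    """Follow the query down an ID3-style tree, built lazily along its branch.

    Returns {feature index: information gain measured at the node that
    tests it}. Assumption: no pruning; the split is the max-gain feature.
    """
    if len(set(labels)) <= 1:
        return {}
    candidates = [f for f in range(len(query)) if f not in used]
    if not candidates:
        return {}
    gains = {f: info_gain(cases, labels, f) for f in candidates}
    best = max(candidates, key=gains.get)
    if gains[best] <= 1e-12:
        return {}
    result = {best: gains[best]}
    kept = [(c, l) for c, l in zip(cases, labels) if c[best] == query[best]]
    if kept:
        sub_cases, sub_labels = zip(*kept)
        result.update(path_gains(sub_cases, sub_labels, query, used | {best}))
    return result


def case_specific_weights(cases, labels, query):
    """Assumed scheme: path features keep their node-level gain, others get 0."""
    on_path = path_gains(cases, labels, query)
    return [on_path.get(f, 0.0) for f in range(len(query))]


def retrieve(cases, labels, query, weights, k=1):
    """k-nearest-neighbour vote using a weighted overlap similarity."""
    def similarity(i):
        return sum(w for w, a, b in zip(weights, cases[i], query) if a == b)

    ranked = sorted(range(len(cases)), key=similarity, reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    cases = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
    labels = ["maj", "maj", "min", "min"]
    query = ("b", "x")
    # Baseline: one global information-gain weight vector for every query.
    baseline = [info_gain(cases, labels, f) for f in range(len(query))]
    # Variant: weights derived from this query's own decision-tree path.
    specific = case_specific_weights(cases, labels, query)
    print(retrieve(cases, labels, query, baseline))
    print(retrieve(cases, labels, query, specific))
```

The baseline and the variant differ only in the weight vector passed to `retrieve`: the baseline computes it once from the whole case base, while the variant recomputes it for each test case.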
Similar resources
Improving Minority Class Prediction Using Case-Specific Feature Weights
This paper addresses the problem of handling skewed class distributions within the case-based learning (CBL) framework. We first present as a baseline an information-gain-weighted CBL algorithm and apply it to three data sets from natural language processing (NLP) with skewed class distributions. Although overall performance of the baseline CBL algorithm is good, we show that the algorithm exhibit...
A Minimum Risk Metric for Nearest Neighbor Classification
nale. Retrieval in a prototype-based case library: A case study in diabetes therapy revision. [CH97] C. Cardie and N. Howe. Improving minority class prediction using case-specific feature weight. [CS93] Scott Cost and Steven Salzberg. A weighted nearest neighbor algorithm for learning with symbolic features. [DP97] Pedro Domingos and Michael Pazzani. On the optimality of the simple Bayesian clas-si...
Improving the Quality of Minority Class Identification in Dialog Act Tagging
We present a method of improving the performance of dialog act tagging in identifying minority classes by using per-class feature optimization and a method of choosing the class based not on confidence, but on a cascade of classifiers. We show that it gives a minority class F-measure error reduction of 22.8%, while also reducing the error for other classes and the overall error by about 10%.
A Two-Step Feature Selection Method to Predict Cancerlectins by Multiview Features and Synthetic Minority Oversampling Technique
Cancerlectins have an inhibitory effect on the growth of cancer cells and are currently being employed as therapeutic agents. The accurate identification of the cancerlectins should provide insight into the molecular mechanisms of cancers. In this study, a new computational method based on the RF (Random Forest) algorithm is proposed for further improving the performance of identifying cancerle...
Evaluation of Classifiers in Software Fault-Proneness Prediction
Reliability of software counts on its fault-prone modules. This means that the less software consists of fault-prone units the more we may trust it. Therefore, if we are able to predict the number of fault-prone modules of software, it will be possible to judge the software reliability. In predicting software fault-prone modules, one of the contributing features is software metric by which one ...
Publication date: 1997